Timer-S1 is a scalable Mixture-of-Experts time series model with 8.3B parameters that uses serial scaling and novel TimeMoE blocks to improve long-term forecasting accuracy.
We introduce Timer-S1, a strong Mixture-of-Experts (MoE) time series foundation model with 8.3B total parameters, 0.75B activated parameters for each token, and a context length of 11.5K. To overcome the scalability bottleneck in existing pre-trained time series foundation models, we perform Serial Scaling in three dimensions: model architecture, dataset, and training pipeline. Timer-S1 integrates sparse TimeMoE blocks and generic TimeSTP blocks for Serial-Token Prediction (STP), a generic training objective that adheres to the serial nature of forecasting. The proposed paradigm introduces serial computations to improve long-term predictions while avoiding costly rolling-style inference and pronounced error accumulation in the standard next-token prediction. Pursuing a high-quality and unbiased training dataset, we curate TimeBench, a corpus with one trillion time points, and apply meticulous data augmentation to mitigate predictive bias. We further pioneer a post-training stage, including continued pre-training and long-context extension, to enhance short-term and long-context performance. Evaluated on the large-scale GIFT-Eval leaderboard, Timer-S1 achieves state-of-the-art forecasting performance, attaining the best MASE and CRPS scores as a pre-trained model. Timer-S1 will be released to facilitate further research.
This tutorial explores how to use LLM embeddings as features in time series forecasting models. It covers generating embeddings from time series descriptions, preparing data, and evaluating the performance of models with and without LLM embeddings.
Cisco and Splunk have introduced the Cisco Time Series Model, a univariate zero-shot time series foundation model designed for observability and security metrics. It is released as an open-weight checkpoint on Hugging Face. The model targets settings where:
* **Multiresolution data is common:** The model handles data where fine-grained (e.g., 1-minute) and coarse-grained (e.g., hourly) data coexist, a typical pattern in observability platforms where older data is often aggregated.
* **Long context windows are needed:** It is built to use longer histories (up to 16,384 points) than many existing time series models, which improves forecasting accuracy.
* **Zero-shot forecasting is desired:** The model aims to provide accurate forecasts *without* requiring task-specific fine-tuning, making it readily applicable to a variety of time series datasets.
* **Quantile forecasting is important:** It predicts not just the mean forecast but also a range of quantiles (0.1 to 0.9), providing a measure of uncertainty.
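Quantile forecasts like the 0.1 to 0.9 range described above are commonly scored with the pinball (quantile) loss. The sketch below is a minimal, dependency-free illustration of that metric. It is not taken from the Cisco Time Series Model's code, and the function names and sample values are assumptions made for the example.

```python
from typing import Sequence

# Quantile levels matching the 0.1 to 0.9 range mentioned in the list above.
QUANTILE_LEVELS = [round(0.1 * i, 1) for i in range(1, 10)]


def pinball_loss(y_true: float, y_pred: float, q: float) -> float:
    """Pinball loss for one observation at quantile level q (0 < q < 1).

    Under-prediction is weighted by q and over-prediction by (1 - q),
    so a high quantile is penalised more for falling below the target.
    """
    diff = y_true - y_pred
    return q * diff if diff >= 0 else (q - 1.0) * diff


def mean_pinball_loss(
    y_true: Sequence[float], y_pred: Sequence[float], q: float
) -> float:
    """Average pinball loss over a series of targets and quantile forecasts."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(pinball_loss(t, p, q) for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative values only (not real model output).
actual = [10.0, 12.0, 11.0]
forecast_q90 = [12.0, 13.0, 13.0]  # hypothetical 0.9-quantile forecast
loss_q90 = mean_pinball_loss(actual, forecast_q90, 0.9)
```

Averaging this loss over all levels in `QUANTILE_LEVELS` gives a single score for the whole predictive distribution, which is one common way to compare probabilistic forecasters.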
A step-by-step guide to catching real anomalies without drowning in false alerts.
This paper provides a theoretical analysis of Transformers' limitations for time series forecasting through the lens of In-Context Learning (ICL) theory, demonstrating that even powerful Transformers often fail to outperform simpler baselines such as linear models. The study focuses on Linear Self-Attention (LSA) models and shows that they cannot achieve lower expected MSE than classical linear models for in-context forecasting, and that predictions collapse to the mean exponentially under Chain-of-Thought inference.
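The collapse-to-the-mean behaviour has a familiar classical analogue, which may help build intuition. The sketch below uses a plain AR(1) model, not the paper's LSA construction, and assumes that Chain-of-Thought inference here means feeding each prediction back in as the next input. For a stationary AR(1) with coefficient `phi`, the iterated h-step forecast is `mean + phi**h * (x - mean)`, so the gap to the mean shrinks geometrically. All names and values are illustrative.

```python
from typing import List


def iterated_ar1_forecast(
    last_value: float, phi: float, mean: float, horizon: int
) -> List[float]:
    """Roll an AR(1) forecast forward by feeding each prediction back in.

    Assumes |phi| < 1 (stationary case). Each step applies
    x_next = mean + phi * (x - mean), with no new observations.
    """
    preds = []
    x = last_value
    for _ in range(horizon):
        x = mean + phi * (x - mean)
        preds.append(x)
    return preds


# Illustrative parameters only.
preds = iterated_ar1_forecast(last_value=5.0, phi=0.8, mean=1.0, horizon=50)

# Distance from the mean at each step; it decays like phi**h.
gaps = [abs(p - 1.0) for p in preds]
```

Here the first forecast is still far from the mean, while the fiftieth is indistinguishable from it for practical purposes, which is the kind of exponential decay the abstract describes.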
This article explores how prompt engineering can be used to improve time-series analysis with Large Language Models (LLMs), covering core strategies, preprocessing, anomaly detection, and feature engineering. It provides practical prompts and examples for various tasks.
This article details a hands-on approach to modeling rare events in time series data using Python. It covers data exploration, defining extreme events, fitting distributions (GEV, Weibull, Gumbel), and evaluating model performance using metrics like log-likelihood, AIC, and BIC. The example uses weather data and provides code snippets for implementation.
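The evaluation step can be sketched as follows. This is a dependency-free illustration, not the article's code: it fits a Gumbel distribution by the method of moments (the article may well use maximum likelihood through a statistics library), compares it against a normal baseline that is an assumption added here for contrast, and uses synthetic maxima rather than the article's weather data.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329


def fit_gumbel_moments(data):
    """Method-of-moments estimates (loc, scale) for a Gumbel (maximum) distribution."""
    scale = statistics.stdev(data) * math.sqrt(6.0) / math.pi
    loc = statistics.fmean(data) - EULER_GAMMA * scale
    return loc, scale


def gumbel_loglik(data, loc, scale):
    """Log-likelihood of data under Gumbel(loc, scale)."""
    total = 0.0
    for x in data:
        z = (x - loc) / scale
        total += -math.log(scale) - z - math.exp(-z)
    return total


def normal_loglik(data, mu, sigma):
    """Log-likelihood of data under Normal(mu, sigma)."""
    const = -0.5 * math.log(2.0 * math.pi * sigma ** 2)
    return sum(const - (x - mu) ** 2 / (2.0 * sigma ** 2) for x in data)


def aic(loglik, k):
    """Akaike information criterion; k is the number of fitted parameters."""
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    """Bayesian information criterion; n is the sample size."""
    return k * math.log(n) - 2 * loglik


# Synthetic "annual maxima" drawn from Gumbel(loc=30, scale=5) by inverse transform.
rng = random.Random(0)
maxima = [30.0 - 5.0 * math.log(-math.log(rng.uniform(1e-12, 1 - 1e-12)))
          for _ in range(500)]
n = len(maxima)

loc, scale = fit_gumbel_moments(maxima)
g_ll = gumbel_loglik(maxima, loc, scale)
n_ll = normal_loglik(maxima, statistics.fmean(maxima), statistics.pstdev(maxima))

# Both candidates have two fitted parameters, so k = 2 for each.
results = {
    name: {"loglik": ll, "aic": aic(ll, 2), "bic": bic(ll, 2, n)}
    for name, ll in (("gumbel", g_ll), ("normal", n_ll))
}
```

Lower AIC or BIC indicates the preferred candidate. BIC penalises extra parameters more heavily than AIC once the sample exceeds a handful of points, which matters when comparing a three-parameter GEV against a two-parameter Gumbel.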
An article discussing the importance of time series databases and data visualization tools like Grafana for managing and interpreting streams of data in various applications.
The author mentions several time series databases (TSDs) and visualization tools, focusing on their features, advantages, and some limitations. The article also provides an example of a Building Management and Control (BMaC) project that uses InfluxDB and Grafana for data visualization.
| Database | Description | Notable Features |
|-------------------|-------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------|
| InfluxDB | Partially open source, with version 3 being an edge data collector. | Shard-based storage, compaction levels, time series index, optional retention. |
| Apache Kudu | Column-based database optimized for multidimensional OLAP workloads. | Part of the Apache Hadoop ecosystem. |
| Prometheus | Developed at SoundCloud for metrics monitoring. | Written in Go, similar to InfluxDB v1 and v2. |
| RRDTool | All-in-one package with a circular buffer TSD that also does graphing. | Language bindings for various programming languages. |
| Graphite | Similar to RRDTool but uses a Django web-based application to render graphs. | Web-based graphing. |
| TimescaleDB | Extends PostgreSQL, supporting typical SQL queries with TSD functionality and optimizations. | Supports all typical SQL queries. |
The article also discusses Grafana as a popular tool for creating dashboards to visualize time series data, mentioning its compatibility with multiple TSDs and SQL databases. It concludes by highlighting the importance of understanding one's specific needs before choosing a TSD and visualization solution.
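The circular-buffer storage noted for RRDTool in the table above can be illustrated with a fixed-capacity series that overwrites its oldest points. This is only a sketch of the general idea. It does not reproduce RRDTool's file format or its consolidation of older samples, and the class and method names are hypothetical.

```python
from collections import deque
from typing import List, Tuple


class RoundRobinSeries:
    """Fixed-capacity time series: once full, each new point evicts the oldest."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        # deque with maxlen discards items from the opposite end when full.
        self._points: deque = deque(maxlen=capacity)

    def append(self, timestamp: int, value: float) -> None:
        """Store one (timestamp, value) sample."""
        self._points.append((timestamp, value))

    def points(self) -> List[Tuple[int, float]]:
        """Return the retained samples, oldest first."""
        return list(self._points)


# Usage: keep only the three most recent samples.
series = RoundRobinSeries(capacity=3)
for ts in range(5):
    series.append(ts, ts * 1.5)

retained = series.points()  # [(2, 3.0), (3, 4.5), (4, 6.0)]
```

The appeal of this design is that storage size is known up front and never grows, at the cost of losing raw history beyond the buffer's capacity.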
IBM’s new foundation model, TSPulse, can go beyond standard forecasting tasks to detect anomalies, fill in missing values, classify data, and search for recurring patterns. It’s also tiny enough to run on a laptop.
This article demonstrates how to use the attention mechanism in a time series classification framework, specifically for classifying normal sine waves versus 'modified' (flattened) sine waves. It details the data generation, model implementation (using a bidirectional LSTM with attention), and results, achieving high accuracy.
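The attention step in such a classifier can be sketched without a deep learning framework. The example below assumes dot-product scoring against a single query vector, which may differ from the variant the article uses. The hidden states stand in for per-timestep outputs of the bidirectional LSTM, and both they and the query (which would be learned in a real model) are made-up values.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    hidden_states: Sequence[Sequence[float]], query: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Collapse a sequence of hidden states into one context vector.

    Each timestep is scored by its dot product with the query, the scores
    are normalised with softmax, and the context vector is the weighted
    sum of hidden states. Returns (context, weights).
    """
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(query)
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return context, weights


# Illustrative hidden states for three timesteps, two features each.
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# A zero query scores every timestep equally, so pooling reduces to the mean.
uniform_context, uniform_weights = attention_pool(states, [0.0, 0.0])

# A query aligned with the first feature focuses on timesteps 0 and 2.
focused_context, focused_weights = attention_pool(states, [10.0, 0.0])
```

In a classifier, the context vector would feed a final dense layer, and the weights can be plotted over time to see which part of the wave (for example, the flattened segment) the model attends to.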